The Morning After: Elon Musk's SpaceX is buying his AI company, xAI

Engadget

Like some sort of corporate Russian doll, SpaceX has announced its acquisition of xAI. The merger will "form the most ambitious, vertically integrated innovation engine on (and off) Earth," according to, well, owner Elon Musk. The AI company, arguably best known for its ongoing CSAM-generating chatbot controversy, might seem like a strange fit for a rocket company. But SpaceX is apparently key to Musk's latest scheme to build AI data centers in space. There might be an argument for moving such resource-intensive operations off-planet.


California is investigating Grok over AI-generated CSAM and nonconsensual deepfakes

Engadget

Governor Gavin Newsom called for an investigation into xAI, and California authorities have now launched one following weeks of reports that the chatbot Grok was generating sexualized images of children. The statement cited a report that more than half of the 20,000 images generated by xAI between Christmas and New Year's depicted people in minimal clothing, including some that appeared to be children. "We have zero tolerance for the AI-based creation and dissemination of nonconsensual intimate images or of child sexual abuse material," Attorney General Rob Bonta said. "Today, my office formally announces an investigation into xAI to determine whether and how xAI violated the law."


Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing

Lin, Justin W., Jones, Eliot Krzysztof, Jasper, Donovan Julian, Ho, Ethan Jun-shen, Wu, Anna, Yang, Arnold Tianyi, Perry, Neil, Zou, Andy, Fredrikson, Matt, Kolter, J. Zico, Liang, Percy, Boneh, Dan, Ho, Daniel E.

arXiv.org Artificial Intelligence

We present the first comprehensive evaluation of AI agents against human cybersecurity professionals in a live enterprise environment. We evaluate ten cybersecurity professionals alongside six existing AI agents and ARTEMIS, our new agent scaffold, on a large university network consisting of ~8,000 hosts across 12 subnets. ARTEMIS is a multi-agent framework featuring dynamic prompt generation, arbitrary sub-agents, and automatic vulnerability triaging. In our comparative study, ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate and outperforming 9 of 10 human participants. While existing scaffolds such as Codex and CyAgent underperformed relative to most human participants, ARTEMIS demonstrated technical sophistication and submission quality comparable to the strongest participants. We observe that AI agents offer advantages in systematic enumeration, parallel exploitation, and cost: certain ARTEMIS variants run at $18/hour versus $60/hour for professional penetration testers. We also identify key capability gaps: AI agents exhibit higher false-positive rates and struggle with GUI-based tasks.
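The paper does not spell out ARTEMIS's internals in this abstract, but the pattern it names (a coordinator that spawns sub-agents with dynamically generated prompts and automatically triages their findings) has a recognizable shape. A minimal Python sketch of that general pattern follows; every name here is illustrative, not the ARTEMIS API.

```python
# Hypothetical sketch of the coordinator/sub-agent pattern the abstract
# describes: dynamic prompt generation plus automatic finding triage.
# All classes and functions are illustrative assumptions, not ARTEMIS's code.
from dataclasses import dataclass, field

@dataclass
class Finding:
    host: str
    description: str
    evidence: str  # empty string means unsubstantiated

@dataclass
class SubAgent:
    prompt: str
    findings: list[Finding] = field(default_factory=list)

def make_prompt(subnet: str, prior: list[Finding]) -> str:
    """Dynamic prompt generation: fold earlier findings into the next task."""
    context = "; ".join(f.description for f in prior) or "none"
    return f"Enumerate {subnet}. Known findings so far: {context}."

def triage(findings: list[Finding]) -> list[Finding]:
    """Automatic triaging: keep only findings backed by concrete evidence."""
    return [f for f in findings if f.evidence]

def run_engagement(subnets: list[str]) -> list[Finding]:
    validated: list[Finding] = []
    for subnet in subnets:
        agent = SubAgent(prompt=make_prompt(subnet, validated))
        # ... a real scaffold would drive an LLM and tooling here ...
        validated.extend(triage(agent.findings))
    return validated
```

The triage step is what an 82% valid submission rate suggests: findings without reproducible evidence are filtered out before submission rather than reported as-is.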


Unlearning Imperative: Securing Trustworthy and Responsible LLMs through Engineered Forgetting

Kang, James Jin, Bui, Dang, Pham, Thanh, Ling, Huo-Chong

arXiv.org Artificial Intelligence

The growing use of large language models in sensitive domains has exposed a critical weakness: these systems still lack reliable mechanisms to guarantee that private information can be permanently removed once it has been used. Retraining from scratch is prohibitively costly, and existing unlearning methods remain fragmented, difficult to verify, and often vulnerable to recovery. This paper surveys recent research on machine unlearning for LLMs and considers how far current approaches can address these challenges. We review methods for evaluating whether forgetting has occurred, the resilience of unlearned models against adversarial attacks, and mechanisms that can support user trust when model complexity or proprietary limits restrict transparency. Technical solutions such as differential privacy, homomorphic encryption, federated learning, and ephemeral memory are examined alongside institutional safeguards, including auditing practices and regulatory frameworks. The review finds steady progress, but robust and verifiable unlearning remains unresolved. Efficient techniques that avoid costly retraining, stronger defenses against adversarial recovery, and governance structures that reinforce accountability are needed if LLMs are to be deployed safely in sensitive applications. By integrating technical and organizational perspectives, this study outlines a pathway toward AI systems that can be required to forget while maintaining both privacy and public trust.
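For concreteness, one common baseline in the unlearning family this survey covers is gradient ascent on the forget set: instead of minimizing loss on data to be removed, the model's loss on that data is deliberately increased. A minimal PyTorch sketch under stated assumptions, not taken from the paper:

```python
# Minimal sketch of gradient-ascent unlearning, a common baseline among the
# methods such surveys cover. Assumes a Hugging Face-style causal LM `model`
# and a DataLoader `forget_loader` of tokenized examples to be forgotten.
import torch

def unlearn(model, forget_loader, lr=1e-5, max_steps=100):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for step, batch in enumerate(forget_loader):
        if step >= max_steps:
            break
        out = model(**batch, labels=batch["input_ids"])
        loss = -out.loss  # negate: ascend, i.e. raise loss on the forget set
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```

The survey's point stands even for this simple baseline: it is cheap relative to retraining, but verifying that the information is truly gone, and that it cannot be recovered adversarially, remains the open problem.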



From Confusion to Clarity: ProtoScore -- A Framework for Evaluating Prototype-Based XAI

Monke, Helena, Sae-Chew, Benjamin, Fresz, Benjamin, Huber, Marco F.

arXiv.org Artificial Intelligence

The complexity and opacity of neural networks (NNs) pose significant challenges, particularly in high-stakes fields such as healthcare, finance, and law, where understanding decision-making processes is crucial. To address these issues, the field of explainable artificial intelligence (XAI) has developed various methods aimed at clarifying AI decision-making, thereby facilitating appropriate trust and validating the fairness of outcomes. Among these methods, prototype-based explanations offer a promising approach that uses representative examples to elucidate model behavior. However, a critical gap exists regarding standardized benchmarks to objectively compare prototype-based XAI methods, especially in the context of time series data. This lack of reliable benchmarks results in subjective evaluations, hindering progress in the field. We aim to establish a robust framework, ProtoScore, for assessing prototype-based XAI methods across different data types with a focus on time series data, facilitating fair and comprehensive evaluations. By integrating the Co-12 properties of Nauta et al., this framework enables effective comparison of prototype methods both against each other and against other XAI methods, ultimately assisting practitioners in selecting appropriate explanation methods while minimizing the costs associated with user studies. All code is publicly available at https://github.com/HelenaM23/ProtoScore.
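ProtoScore's exact metrics are defined in the paper and repository; the general shape of a property-based benchmark like this (score each method on a set of Co-12-style properties, then aggregate) can be sketched as follows, with all property names, weights, and evaluators as hypothetical placeholders:

```python
# Hypothetical sketch of property-based scoring in the spirit of ProtoScore:
# evaluate an explanation method on several Co-12-style properties and
# aggregate into one comparable score. Names and weights are illustrative.
from typing import Callable

Evaluator = Callable[[object], float]  # maps a method to a score in [0, 1]

def proto_score(method, evaluators: dict[str, Evaluator],
                weights: dict[str, float]) -> float:
    scores = {name: ev(method) for name, ev in evaluators.items()}
    total = sum(weights[name] * s for name, s in scores.items())
    return total / sum(weights.values())

# Usage: rank competing prototype methods on the same property set.
# ranked = sorted(methods,
#                 key=lambda m: proto_score(m, evaluators, weights),
#                 reverse=True)
```

The appeal of such automated property scores is exactly what the abstract claims: methods become comparable on a common scale without commissioning a user study for every pairwise comparison.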


HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages

Wang, Zhilin, Zeng, Jiaqi, Delalleau, Olivier, Shin, Hoo-Chang, Soares, Felipe, Bukharin, Alexander, Evans, Ellie, Dong, Yi, Kuchaiev, Oleksii

arXiv.org Artificial Intelligence

Preference datasets are essential for training general-domain, instruction-following language models with Reinforcement Learning from Human Feedback (RLHF). Each subsequent data release raises expectations for future data collection, meaning there is a constant need to advance the quality and diversity of openly available preference data. To address this need, we introduce HelpSteer3-Preference, a permissively licensed (CC-BY-4.0), high-quality, human-annotated preference dataset comprising over 40,000 samples. These samples span diverse real-world applications of large language models (LLMs), including tasks relating to STEM, coding, and multilingual scenarios. Using HelpSteer3-Preference, we train Reward Models (RMs) that achieve top performance on RM-Bench (82.4%) and JudgeBench (73.7%). This represents a substantial improvement (~10% absolute) over the previously best-reported results from existing RMs. We also demonstrate that HelpSteer3-Preference can be used to train Generative RMs, and we show how policy models can be aligned with RLHF using our RMs. Dataset (CC-BY-4.0): https://huggingface.co/datasets/nvidia/HelpSteer3#preference Models (NVIDIA Open Model): https://huggingface.co/collections/nvidia/reward-models-68377c5955575f71fcc7a2a3
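The standard way pairwise preference data like this trains a reward model is a Bradley-Terry loss over chosen/rejected response pairs. A minimal PyTorch sketch follows; the load_dataset config name is inferred from the URL fragment above and may differ, and the reward model itself is left abstract:

```python
# Minimal sketch of Bradley-Terry reward-model training on pairwise
# preference data such as HelpSteer3-Preference. The "preference" config
# name is an assumption taken from the dataset URL; a real reward model
# maps a tokenized (prompt, response) pair to a scalar score.
import torch
import torch.nn.functional as F
from datasets import load_dataset

ds = load_dataset("nvidia/HelpSteer3", "preference")  # config name assumed

def bradley_terry_loss(r_chosen: torch.Tensor,
                       r_rejected: torch.Tensor) -> torch.Tensor:
    # Maximize P(chosen preferred) = sigmoid(r_chosen - r_rejected),
    # i.e. minimize the negative log-likelihood of the annotated preference.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

Once trained this way, the scalar RM can score candidate responses for RLHF, which is the pipeline the abstract describes for aligning policy models.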


On AI Verification in Open RAN

Soundrarajan, Rahul, Fiandrino, Claudio, Polese, Michele, D'Oro, Salvatore, Bonati, Leonardo, Melodia, Tommaso

arXiv.org Artificial Intelligence

Open RAN introduces a flexible, cloud-based architecture for the Radio Access Network (RAN), enabling Artificial Intelligence (AI)/Machine Learning (ML)-driven automation across heterogeneous, multi-vendor deployments. While EXplainable Artificial Intelligence (XAI) helps mitigate the opacity of AI models, explainability alone does not guarantee reliable network operations. In this article, we propose a lightweight verification approach based on interpretable models to validate the behavior of Deep Reinforcement Learning (DRL) agents for RAN slicing and scheduling in Open RAN. Specifically, we use Decision Tree (DT)-based verifiers to perform near-real-time consistency checks at runtime, which would otherwise be infeasible with computationally expensive state-of-the-art verifiers. We analyze the landscape of XAI and AI verification, propose a scalable architectural integration, and demonstrate feasibility with a DT-based slice-verifier. We also outline future challenges to ensure trustworthy AI adoption in Open RAN.
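The DT-verifier pattern the article proposes can be sketched briefly: distill the DRL agent's observed behavior into an interpretable decision tree offline, then use the cheap tree at runtime to flag decisions that deviate from the distilled policy. A minimal sketch under assumed data shapes, with the specific features, depth, and thresholds as illustrative choices rather than the paper's configuration:

```python
# Minimal sketch of DT-based runtime verification of a DRL agent:
# fit a shallow, interpretable tree to logged (state, action) pairs
# offline, then use it as a fast consistency check on live decisions.
# Feature layout and tree depth are illustrative assumptions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_verifier(states: np.ndarray, actions: np.ndarray):
    tree = DecisionTreeClassifier(max_depth=6)  # shallow: interpretable, fast
    tree.fit(states, actions)
    return tree

def verify(tree, state: np.ndarray, agent_action: int) -> bool:
    """Near-real-time check: does the agent's action match the distilled policy?"""
    expected = tree.predict(state.reshape(1, -1))[0]
    return expected == agent_action  # False: flag for slower offline analysis
```

A tree prediction costs microseconds, which is why such a check fits the near-real-time control loops of Open RAN where formal state-of-the-art verifiers would be too slow.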


BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs

Petrov, Ivo, Dekoninck, Jasper, Vechev, Martin

arXiv.org Artificial Intelligence

Large language models (LLMs) have recently shown strong performance on mathematical benchmarks. At the same time, they are prone to hallucination and sycophancy, often producing convincing but flawed proofs for incorrect mathematical statements supplied by users. This significantly limits the applicability of LLMs in theorem proving, as verification of these flawed proofs must be done manually by expert mathematicians. However, existing benchmarks that measure sycophancy in mathematics are limited: they focus solely on final-answer problems, rely on very simple and often contaminated datasets, and construct benchmark samples using synthetic modifications that create ill-posed questions rather than well-posed questions that are demonstrably false. To address these issues, we introduce BrokenMath, the first benchmark for evaluating sycophantic behavior in LLMs within the context of natural language theorem proving. BrokenMath is built from advanced 2025 competition problems, which are perturbed with an LLM to produce false statements and subsequently refined through expert review. Using an LLM-as-a-judge framework, we evaluate state-of-the-art LLMs and agentic systems and find that sycophancy is widespread, with the best model, GPT-5, producing sycophantic answers 29% of the time. We further investigate several mitigation strategies, including test-time interventions and supervised fine-tuning on curated sycophantic examples. These approaches substantially reduce, but do not eliminate, sycophantic behavior.
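The LLM-as-a-judge measurement the abstract mentions reduces to a simple loop: given a statement known to be false and a model's answer, ask a judge model whether the answer accepts and tries to prove the falsehood. A minimal sketch of that idea, where the prompt wording and the query_llm helper are assumptions for illustration, not the paper's protocol:

```python
# Hypothetical sketch of an LLM-as-a-judge sycophancy check in the spirit
# of BrokenMath. `query_llm` is an assumed callable (prompt -> str), not an
# API from the paper; the judge prompt wording is likewise illustrative.

JUDGE_PROMPT = """The following statement is known to be FALSE:
{statement}

A model answered:
{answer}

Does the answer accept the statement and attempt to prove it (sycophantic),
or does it push back / disprove it? Reply with exactly SYCOPHANTIC or NOT."""

def is_sycophantic(statement: str, answer: str, query_llm) -> bool:
    verdict = query_llm(JUDGE_PROMPT.format(statement=statement, answer=answer))
    return verdict.strip().upper().startswith("SYCOPHANTIC")

def sycophancy_rate(items, query_llm) -> float:
    """items: iterable of (false_statement, model_answer) pairs."""
    flags = [is_sycophantic(s, a, query_llm) for s, a in items]
    return sum(flags) / len(flags)
```

Under a scheme like this, the abstract's headline number reads directly: a 29% sycophancy rate means the judge flagged roughly 29 of every 100 answers as accepting and "proving" a demonstrably false statement.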